Skip to content

TW-4877: webhook server preflight UX + security hardening#63

Merged
qasim-nylas merged 10 commits intomainfrom
feature/TW-4877-webhook-preflight-hardening
Apr 29, 2026
Merged

TW-4877: webhook server preflight UX + security hardening#63
qasim-nylas merged 10 commits intomainfrom
feature/TW-4877-webhook-preflight-hardening

Conversation

@qasim-nylas
Copy link
Copy Markdown
Collaborator

@qasim-nylas qasim-nylas commented Apr 27, 2026

Summary

  • Interactive preflight on nylas webhook server — detects cloudflared, offers brew install on macOS, prompts to enable tunnel, reads secret with echo disabled.
  • New --no-tunnel flag; loopback-only output no longer instructs user to register localhost with Nylas.
  • Webhookserver hardening: 1 MiB body cap, replay-protection (MaxEventAge), bounded handler goroutines, events_dropped in /health.
  • Crypto: Argon2id t=1t=3, 12-char passphrase floor, derived keys zeroed.
  • XSS fix in Air AI summary modal; scheme allow-list on notetaker-open-external.
  • Silent-failure cleanup in pattern_learner, doctor_checks, base_client.go, requestLocation, admin.go pagination.

Test plan

  • make ci green (build, vet, lint, race, security, govulncheck)
  • nylas webhook server (TTY, no flags) prompts for tunnel; Ctrl-D exits cleanly
  • nylas webhook server --no-tunnel runs loopback-only without prompt
  • Webhook receiver returns 413 on >1 MiB bodies
  • Air UI loads; AI summary modal renders without XSS

Originating issue: nylas webhook server displayed a localhost URL and told
the user to register it with Nylas — but Nylas can't reach localhost, so
events never fired (Slack #cli, 2026-04-27).

Webhook preflight (the originating issue)
- Interactive preflight when neither --tunnel nor --no-tunnel is set:
  detect cloudflared, offer brew install on macOS, prompt to enable
  tunnel, read secret with terminal echo disabled.
- New --no-tunnel flag for non-interactive opt-out.
- Loopback-only output no longer instructs user to register localhost.
- Prompter interface so EOF (Ctrl-D) propagates cleanly instead of
  silently flipping into --allow-unsigned or auto-running brew install.

Webhook server hardening (internal/adapters/webhookserver)
- 1 MiB request body cap (http.MaxBytesReader).
- Replay protection via MaxEventAge (CloudEvents `time` skew check).
- Goroutine fanout bounded by 32-slot semaphore + per-handler recover().
- events_dropped surfaced in /health.
- Event-display goroutine exits on ctx.Done() and recovers from panics.

Other silent-failure fixes uncovered in review
- pattern_learner: surface per-calendar errors instead of silent skip.
- doctor_checks: propagate config/secret-store failures explicitly.
- base_client.go: error on malformed tool-call JSON.
- requestLocation: return error on bad timezone instead of UTC fallback.
- admin.go: cycle guard + maxGrantPages ceiling on cursor pagination.

Crypto hardening (internal/adapters/keyring)
- Argon2id raised from t=1 to t=3.
- NYLAS_FILE_STORE_PASSPHRASE minimum length 12.
- Derived AES keys zeroed after use.

Frontend (Air)
- XSS fix: AI summary sentiment/category now escapeHtml-ed.
- notetaker-open-external: scheme allow-list (https?://) + noopener.

Tests
- New unit tests for preflightTunnelChoice (mocked Prompter).
- New webhookserver tests: oversized body 413, replay window, loopback
  bind, events_dropped in /health.
- webguard middleware: Referer-only fallback path.
- All packages pass go test -race; golangci-lint clean.
- Air calendar event card: card-level click handler now bails when the
  click originates inside a [data-action] element or <a href>, so the
  join-meeting link no longer also opens the Edit Event modal. Both
  listeners live on document, so stopPropagation can't help — siblings,
  not parent/child.
- UI accessibility: scope the nav-buttons assertions to the navigation
  landmark and use exact-match names so dashboard account buttons like
  "ACCOUNT 23@qasim.nylas.email" can't satisfy the broader regex.
1. EncryptedFileStore.Get migration race: Get held only RLock while
   loadSecrets→decrypt could run migrateToPassphrase, which writes BOTH
   .secrets.salt and .secrets.enc. Two concurrent first-readers could
   interleave those writes and leave a salt/ciphertext pair that no
   longer decrypts. Get now takes the exclusive Lock. Adds a
   concurrency test (32 goroutines, race detector clean) that re-opens
   the store from a cold start and verifies round-trip after migration.

2. doctor checkGrants treated client_secret as required, but the rest
   of the codebase (auth/config.go, auth/show.go, auth/helpers.go,
   auth/migrate.go) treats it as optional. Now only API key + client
   ID are required; ErrSecretNotFound on client_secret is treated as
   empty. Real keyring read failures still surface explicitly.

3. preflightTunnelChoice TTY check was hard-wired to term.IsTerminal,
   so the EOF-on-secret and empty-secret tests were permanently
   skipped via an isTerminalStdin() helper that always returned false.
   The function now takes an `interactive bool` parameter (runServer
   passes term.IsTerminal; tests pass true). Cloudflared/brew probes
   are now package-level seams so tests can drive the secret-prompt
   path without a real binary on PATH. Added a new positive-path test
   for explicit unsigned-confirm.
The /v3/grants LIST endpoint can lag freshly-created managed grants by
tens of seconds (>70s observed) while GET-by-id is strongly consistent.
Tests that relied on list-based visibility checks were hitting the
90-second deadline intermittently.

Adapter (managed_grants.go):
- Drop the server-side `provider=nylas` filter; LIST without filter
  surfaces fresh grants almost immediately. Filter to provider
  client-side. Trades ~4x more page bytes for freshness.
- Add cycle detection on next_cursor and a 1000-page cap so a
  misbehaving server can't trap pagination in an unbounded loop.

CLI (agent_scope.go + helpers.go + create.go + list.go +
policy_list_get.go):
- loadAgentPolicyScope upserts the configured default agent (via
  NYLAS_GRANT_ID or default grant) into the listed accounts, and
  back-fills any missing policy via GetPolicy. Rule and policy CLI
  surfaces now succeed even when the LIST hasn't propagated.
- Apply the same GET-by-id fallback to resolveAgentID (email->ID),
  findExistingAgentAccountByEmail, and runList rendering.

Integration tests:
- New waitForAgentByID helper using the consistent GET endpoint.
- Cleanups simplified to use the create-returned ID directly.
- Set NYLAS_GRANT_ID immediately after create so subsequent CLI calls
  hit the GET-by-id fallback.

File splits to honor the 500-line ideal limit:
- agent_scope.go: extracted policy-scope helpers + agentPolicyScope
  type (shared by rule.go and policy_list_get.go).
- rule_print.go: extracted rule formatting helpers from rule.go.
- agent_scope_test.go and rule_print_test.go: mirror the source split.

Verification: 24/24 agent integration tests PASS in 294s (previously
6 timed out at 90s); unit tests, build, vet, lint all green.
@qasim-nylas qasim-nylas requested a review from sourcesoft April 28, 2026 23:25
quzhi1
quzhi1 previously approved these changes Apr 28, 2026
@qasim-nylas qasim-nylas merged commit 81a4377 into main Apr 29, 2026
6 checks passed
@qasim-nylas qasim-nylas deleted the feature/TW-4877-webhook-preflight-hardening branch April 29, 2026 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants